Fix OTTERS lassosum selector parity #482
Merged
danielnachun merged 5 commits into StatFunGen:main from Apr 24, 2026
Summary
This PR fixes the OTTERS lassosum regression by replacing the default OTTERS selector with the LD-quadratic pseudovalidation score:
score(beta) = (c^T beta) / sqrt(beta^T R beta)
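where c is the vector of summary-statistics correlations, R is the LD (correlation) matrix, and beta is a candidate coefficient vector from the lassosum path.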
This also removes the earlier genotype-format-specific selector patch. min(fbeta) is kept only as an explicit debug option.
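As a rough illustration of the selector described above, here is a minimal Python sketch (not the package's R code). The function names, the toy inputs, and the rule of keeping the highest-scoring grid point are assumptions made for illustration. The handling of an all-zero beta is likewise a choice of this sketch, not something the PR specifies.

```python
import math

def ld_quadratic_score(c, R, beta):
    """Return (c^T beta) / sqrt(beta^T R beta), or -inf for a degenerate beta."""
    numerator = sum(ci * bi for ci, bi in zip(c, beta))
    # beta^T R beta, accumulated row by row
    quad = sum(bi * sum(rij * bj for rij, bj in zip(row, beta))
               for bi, row in zip(beta, R))
    if quad <= 0.0:
        return float("-inf")  # sketch-only convention for an all-zero beta
    return numerator / math.sqrt(quad)

def select_model(c, R, beta_path):
    """Return the (s, lambda) key with the highest score (assumed selection rule)."""
    return max(beta_path, key=lambda key: ld_quadratic_score(c, R, beta_path[key]))

# Hypothetical toy inputs, not taken from any fixture in this PR.
c = [0.30, 0.10]
R = [[1.0, 0.5],
     [0.5, 1.0]]
beta_path = {
    (0.5, 0.01): [0.30, 0.00],
    (1.0, 0.01): [0.20, 0.20],
    (1.0, 0.10): [0.00, 0.00],
}

best = select_model(c, R, beta_path)
best_score = ld_quadratic_score(c, R, beta_path[best])
```

Only c, R, and the beta path enter the computation, so the same selector can run regardless of the genotype storage format behind the LD matrix.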
Root Cause
Old OTTERS did not select lassosum models by min(fbeta). It fit the beta path and then used lassosum pseudovalidation to choose the final (s, lambda).
The refactor changed that selector to min(fbeta), and the OTTERS wrapper also double-scaled the lassosum input before it reached the low-level solver.
Fixture 206 isolates the selector bug cleanly:
Published lassosum selected s = 0.2, lambda = 1e-4, while min(fbeta) selected s = 1, lambda = 1e-4 on the same grid. This is not a grid-definition problem; it is a selector regression.
Mathematical Rationale
Old pseudovalidation can be written as:
scaled_beta = beta / sd
pred = X * scaled_beta
score = (c^T beta) / sqrt(Var(pred))
After centering and standardizing the reference matrix columns by the same per-variant scale, this becomes:
score(beta) = (c^T beta) / sqrt(beta^T R beta)
So the selector can be evaluated directly from summary-statistics correlation and LD, without using genotype explicitly.
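The equivalence can be checked numerically. The following Python sketch is illustrative only: the helper names and the random toy genotype matrix are assumptions, and both variances use the n - 1 denominator. It computes the old genotype-based score and the LD-quadratic score from the same reference matrix and the same per-variant standard deviations.

```python
import math
import random

def column_means_sds(X):
    """Per-column mean and sample standard deviation (n - 1 denominator)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    return means, sds

def genotype_score(X, c, beta):
    """Old-style score: c^T beta / sqrt(Var(X * (beta / sd)))."""
    _, sds = column_means_sds(X)
    scaled_beta = [b / s for b, s in zip(beta, sds)]
    pred = [sum(x * sb for x, sb in zip(row, scaled_beta)) for row in X]
    mean_pred = sum(pred) / len(pred)
    var_pred = sum((v - mean_pred) ** 2 for v in pred) / (len(pred) - 1)
    return sum(ci * bi for ci, bi in zip(c, beta)) / math.sqrt(var_pred)

def ld_quadratic_score(X, c, beta):
    """Same score computed from the LD matrix R of the standardized columns."""
    means, sds = column_means_sds(X)
    p, n = len(sds), len(X)
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    R = [[sum(z[j] * z[k] for z in Z) / (n - 1) for k in range(p)]
         for j in range(p)]
    quad = sum(beta[j] * R[j][k] * beta[k] for j in range(p) for k in range(p))
    return sum(ci * bi for ci, bi in zip(c, beta)) / math.sqrt(quad)

# Hypothetical toy data: 50 samples, 4 variants coded 0/1/2.
random.seed(0)
X = [[random.randint(0, 2) for _ in range(4)] for _ in range(50)]
c = [0.30, -0.10, 0.05, 0.20]
beta = [0.25, 0.00, -0.05, 0.10]

score_genotype = genotype_score(X, c, beta)
score_ld = ld_quadratic_score(X, c, beta)
```

The two values should agree up to floating-point error, because Var(X * (beta / sd)) equals beta^T R beta when R is the correlation matrix of the same columns.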
Validation
PLINK1 source: genotype matrix vs LD-quadratic
The LD-quadratic score matches PLINK1 genotype pseudovalidation essentially exactly.
This validates the selector formula itself.
Sketch source: sample matrix vs LD-quadratic
For the sketch source, the sample-matrix pseudovalidation and the LD-quadratic score are the same numeric object once both are built from the same restored sketch matrix and the same column standardization.
So the remaining mismatch is not between sample-matrix pseudovalidation and quadratic LD scoring. It is between the current sketch-derived standardized LD path and the PLINK1/genotype-backed standardized LD path.
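One way to quantify such a mismatch is to compare the two LD matrices directly and see how the difference propagates into the score denominator. The Python sketch below uses hypothetical 2x2 matrices and helper names; it is not the diagnostic used in this PR, and the numbers are not measured results.

```python
def max_abs_diff(R_a, R_b):
    """Largest entrywise absolute difference between two LD matrices."""
    return max(abs(a - b)
               for row_a, row_b in zip(R_a, R_b)
               for a, b in zip(row_a, row_b))

def quad_form(R, beta):
    """beta^T R beta."""
    return sum(bi * sum(rij * bj for rij, bj in zip(row, beta))
               for bi, row in zip(beta, R))

# Hypothetical LD matrices standing in for the two standardized LD paths.
R_genotype = [[1.0, 0.50],
              [0.50, 1.0]]
R_sketch = [[1.0, 0.44],
            [0.44, 1.0]]
beta = [0.2, 0.2]

ld_gap = max_abs_diff(R_genotype, R_sketch)

# The numerator c^T beta does not depend on R, so the two scores differ
# by exactly the inverse of this denominator ratio.
denominator_ratio = (quad_form(R_sketch, beta) / quad_form(R_genotype, beta)) ** 0.5
```

Because c^T beta is shared, any difference in the selected (s, lambda) between the two sources comes only from how beta^T R beta changes across the grid.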
What This PR Changes
R/regularized_regression.R
R/otters.R